AGTCCCAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGAATGTCGTCCCAG

CCAGCAGGGAACGAGACCTCCCCCGGGGCCACAGAGGAGTACTCCTATGGCAGCTGGTAC 

ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCCTGCCAC 

ACCAGCATACCACGCGGCCTGTAGCAGGCCTGCCTGGCCTGGCTGTCAATCCTTGTGCTG 

GTGCTCCTGGCCATGGTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 

GCCGGCGTGGCCAGCCGTGTGGATTTCTTGGCTGGGGAGAGGCCCCGGGGAGTGCCTGCT 

GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCGCGACGAGGACGCATTG 

CGCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 

GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTAGTAGGCTCTGGCTGCCTGT 

GCCACGGGTGGCCACACAGCTGCACACCTGCTCGGCAGGACGCTGTCCTGGGCGCACCTT 

GGGGTCCAGGTGTGGCAGAGGGCAGAGTGTCCGCAGGTGCCCAAGATCTACAAGTACTAG 

TCCCTGCTGGCCTCCGTGCCTGTCCTGCTGGGCCTGGGATTGCTGAGCCTTTGGTACCCT 

GTGCAGCTGGTGAGAAGCTTCAGCGGTAGGACAGGAGGAGGCTCGAAGGGGCTGCAGAGG 

AGCTACTCTGAGGAATATGTGAGGAACCTGCTTTGCAGGAAGAAGCTGGGAAGGAGCTAC 

GACAGGTCCAAGCATGGGTTCCTGTCCTGGGCGCGCGTGTGCTTGAGACACTGCATCTAG 

ACTCGACAGCCAGGATTGCATCTCCCGCTGAAGCTGGTGCTTTGAGGTAGACTGACAGGG 

AGGGCGATTTACCAGGTGGGCCTGCTGCTGCTGGTGGGCGTGGTACGCAGTATCCAGAAG 

GTGAGGGCAGGGGTCACCACGGATGTGTCGTACGTGGTGGCGGGCTTTGGAATGGTGCTG 

TGCGAGGAGAAGCAGGAGGTGGTGGAGCTGGTGAAGGACCATGTGTGGGGTCTGGAAGTG 

TGCTACATGTCAGCGTTGGTGTTGTGCTGCTTAGTCACCTTGCTGGTCCTGATGCGCTCA 

GTGGTGACACAGAGGACCAACGTTCGAGCTCTGGACCGAGGAGCTGCCCTGGACTTGAGT 

GCCTTGGATGGGAGTGCCCATCCCTGCGGGCAAGCGATATTCTGTTGGATGAGCTTCAGT 

GCGTACCAGACAGGCTTTATCTGCCTTGGGCTCCTGGTGCAGGAGATCATGTTGTTCCTG 

GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 

GTCTTCCGTTCGCTGGAGTCCTCGTGGGCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 

CTGCAGAACATGGCAGCCCATTGGGTCTTGCTGGAGAGTCATGATGGACACGCACAGCTG 

ACGAAGCGGCGAGTGCTCTATGCAGCCACCTTTCTTGTGTTCCCCGTGAATGTGCTGGTG 

GGTGGGATGGTGGCCACCTGGGGAGTGCTCCTGTCTGCCCTCTAGAAGGCCATCCACCTT 

GGCCAGATGGACCTCAGCCTGGTGGCACCGAGAGGGGCCACTCTGGACCCCGGCTAGTAG 

ACGTACGGAAACTTGTTGAAGATTGAAGTCAGGCAGTCGCATCCAGGCATGACAGCCTTC 

TGCTGCCTGCTCCTGCAAGCGCAGAGGCTCGTACCCAGGAGCATGGCAGCCCCCCAGGAC 

AGCCTGAGAGCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGAGAAAGGACTCCATG 

GCCAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTAGACG 

CTGCTGGAGAACCGAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 

GGCCAGCCGTGAGGGGAGGGAAGGTCAAGCCACCTGCCCATCTGTGCTGAGGCATGTTCG 

TGGCTACCATGCTCCTCCCTCCCCGGCTCTCCTCGCAGCATCACAGCAGCCATGCAGCCA 

GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG 

GGCTGTGCTCCACCCAGTTGGCTATGGGAGAGCCAGGAGGGGTTCTGGAGAAAAAAACTG 

GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGCGAGGGCAGCCACATCGAGGCGTCTC 

CCTACCCTGGGTGTGCGATGAGCCTTGAAGGGCGTCGATGAAGCGTTCTGTGGAACCACT 

CCAGGCCAGCTCCACCTCAGGCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTGCT 

GACCCCCTCAGCGGCAGGGACCTCTCTGGGGAGTGGCCGGAAAGGTCCCGGTCCTGTGGC 

CTGGAGGGCAGCCCAAGTGATGACTCAGACCAGGTCCCACACTGAGGTGGCGACACTCGA 

GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTC 

GCTGC AATAAA GTTGTTCCTGAGAAAAAAAAAJVAAAAAAAAAAA^ 

AAJ\AAAJ\AAAAAAAAAAAAAA^ 
MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYRACLASLS
IbVLLLLMLVRRRQLWPDCTO 

EDALPFLTbASAPSQDGKTEAPPGAWKILGLFYYAAjbYYPLAACATAGHTAAHLLGSTLS 
5 WAJ3LGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK 
GLOSSYSEEYLRNLLCRKKLGSSYHTSKHGFLSWAPVCLPHCIYTPOPGFHLPLKI.VLSA 
TLTGTA3Y0VALLLLVGWPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQE\A^LVKHHLW 
ALEVCY I SALVLS CLLTFLVLMR S LVTHRTN LRALHRGAALDLS PLHR S PH PSR QA I FCW 
MSFS AYQTAFI CLGLLVQQI I FFLGTTALAFLVLMPVLHGRtf LLLFRS LESS WPFWL TLA 
1 0 LAVILQN^1AAHVA/FLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAMVATWRVLLSAL™ 
AIHLGQMDLSLLPPRAATLDPGYYTYR^FLKIEVSQSHPAMTAFCSLLLQAQSLLPRTMA 
APQDSLRPGEEDEGMOLLQTKDSHAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL 
GANGAQP 

Important features of the protein:

Signal peptide:
None

Transmembrane domain:
54-69
102-139
148-166
207-222
301-320
364-380
431-451
474-489
560- 535

Motif file:

Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
50-56
176-182
241-247
317-323
341-347
525-531
627-633
631-637
640-646
661-667

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
364-375

Motif name: ATP/GTP-binding site motif A (P-loop).
PRO                   XXXXXXXXXXXXXXX    (Length = 15 amino acids)
Comparison Protein    XXXXXYYYYYYY       (Length = 12 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide sequences as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO polypeptide) =

5 divided by 15 = 33.3%
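
The calculation above can be sketched in C. This is only an illustration, not part of the ALIGN-2 listing: the function name percent_identity is hypothetical, and it assumes the two sequences are supplied already aligned, as equal-length strings in which '-' marks a gap.

```c
#include <stddef.h>

/* Percent identity as defined in the text: identically matching positions
 * divided by the total length of the PRO sequence, times 100.
 * Hypothetical helper. Both arguments are assumed to be equal-length
 * aligned strings in which '-' marks a gap. */
double percent_identity(const char *pro_aligned, const char *cmp_aligned)
{
    size_t matches = 0, pro_len = 0;
    size_t i;

    for (i = 0; pro_aligned[i] != '\0' && cmp_aligned[i] != '\0'; i++) {
        if (pro_aligned[i] != '-') {
            pro_len++;                      /* counts every PRO residue */
            if (pro_aligned[i] == cmp_aligned[i])
                matches++;                  /* identical aligned residue */
        }
    }
    return pro_len ? 100.0 * (double)matches / (double)pro_len : 0.0;
}
```

Because the denominator is the PRO length rather than the alignment length, the same five matches give a different percentage when the PRO sequence is shorter or longer than the comparison sequence.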
PRO                   XXXXXXXXXX         (Length = 10 amino acids)
Comparison Protein    XXXXXYYYYYYZZYZ    (Length = 15 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide sequences as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO polypeptide) =

5 divided by 10 = 50%
PRO-DNA           NNNNNNNNNNNNNN      (Length = 14 nucleotides)
Comparison DNA    NNNNNNLLLLLLLLLL    (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic acid sequence) =

6 divided by 14 = 42.9%
PRO-DNA           NNNNNNNNNNNN    (Length = 12 nucleotides)
Comparison DNA    NNNNLLLVV       (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic acid sequence) =

4 divided by 12 = 33.3%



FIGURE 4A

/*
 *
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define _M      -8      /* value of a match with a stop */

int     _day[26][26] = {
/*       A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in a path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 - 1 */

struct diag {
    int         score;          /* score at last jmp */
    long        offset;         /* offset of prev block */
    short       ijmp;           /* current jmp index */
    struct jmp  jp;             /* list of jmps */
};

struct path {
    int     spc;                /* number of leading spaces */
    short   n[JMPS];            /* size of jmp (gap) */
    int     x[JMPS];            /* loc of jmp (last elem before gap) */
};

char        *ofile;             /* output file name */
char        *namex[2];          /* seq names: getseqs() */
char        *prog;              /* prog name for err msgs */
char        *seqx[2];           /* seqs: getseqs() */
int         dmax;               /* best diag: nw() */
int         dmax0;              /* final diag */
int         dna;                /* set if dna: main() */
int         endgaps;            /* set if penalizing end gaps */
int         gapx, gapy;         /* total gaps in seqs */
int         len0, len1;         /* seq lens */
int         ngapx, ngapy;       /* total size of gaps */
int         smax;               /* max score: nw() */
int         *xbm;               /* bitmap for matching */
long        offset;             /* current offset in jmp file */
struct diag *dx;                /* holds diagonals */
struct path pp[2];              /* holds path for seqs */

char        *calloc(), *malloc(), *index(), *strcpy();
char        *getseq(), *g_calloc();
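
The gap constants defined in nw.h combine into an affine gap cost: a fixed charge for opening a gap plus a charge for each element in it. A minimal sketch follows. The helper name gap_cost is hypothetical and not part of the listing, and the sketch deliberately ignores the MAXGAP cap that the program applies to long gaps.

```c
/* Penalty constants copied from the nw.h listing. */
#define DINS0 8 /* DNA: penalty for a gap */
#define DINS1 1 /* DNA: penalty per base */
#define PINS0 8 /* protein: penalty for a gap */
#define PINS1 4 /* protein: penalty per residue */

/* Hypothetical helper: total cost of one gap of n elements,
 * without the MAXGAP cap used by the full program. */
int gap_cost(int is_dna, int n)
{
    int ins0 = is_dna ? DINS0 : PINS0;  /* charged once per gap */
    int ins1 = is_dna ? DINS1 : PINS1;  /* charged per gapped element */

    return n > 0 ? ins0 + ins1 * n : 0;
}
```

Under these assumptions a three-base DNA gap costs 8 + 3 x 1 = 11, while a three-residue protein gap costs 8 + 3 x 4 = 20.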
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static  _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static  _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;                /* 1 to penalize endgaps */
    ofile = "align.out";        /* output file */

    nw();           /* fill in the matrix, get the possible jmps */
    readjmps();     /* get the actual jmps */
    print();        /* print stats, alignment */

    cleanup(0);     /* unlink any tmp files */
}
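
The _pbval table gives each residue letter its own bit, and gives the ambiguity codes B and Z the bits of the residues they stand for, so that a match test is a single bitwise AND. The following sketch illustrates the idea for those entries only. The names BIT, residue_mask and residues_match are hypothetical, uppercase input is assumed, and the joker entry J (which carries every bit in the table) is left out for brevity.

```c
/* Bit for a letter: bit 0 for 'A', bit 1 for 'B', and so on. */
#define BIT(c) (1 << ((c) - 'A'))

/* Hypothetical helper mirroring three kinds of _pbval entry:
 * B also carries the D and N bits, Z also carries the E and Q bits,
 * every other letter carries only its own bit. */
int residue_mask(char c)
{
    switch (c) {
    case 'B': return BIT('B') | BIT('D') | BIT('N');
    case 'Z': return BIT('Z') | BIT('E') | BIT('Q');
    default:  return BIT(c);
    }
}

/* Two residues match when their masks share at least one bit. */
int residues_match(char a, char b)
{
    return (residue_mask(a) & residue_mask(b)) != 0;
}
```

With this encoding B matches both D and N, yet D and N still do not match each other, which is the behaviour the table encodes for the ambiguity codes.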
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }

Page 2 of nw.c



FIGURE 4E 

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */
            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}

Page 4 of nw.c
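
The diagonal index computed in nw() as id = xx - yy + len1 - 1 groups every matrix cell with the same xx - yy onto one diagonal. A small sketch of that mapping is given below. The function name diag_index is hypothetical, and xx and yy are assumed to be the 1-based positions used in the listing.

```c
/* Diagonal index as computed in nw(): cells with the same xx - yy share
 * a diagonal. With xx in 1..len0 and yy in 1..len1 the result lies in
 * 0..len0+len1-2, which fits in the len0+len1+1 entries allocated for dx. */
int diag_index(int xx, int yy, int len1)
{
    return xx - yy + len1 - 1;
}
```

The main diagonal (xx equal to yy) maps to len1 - 1, which is the value that print() compares dmax against when deciding whether the alignment starts with a leading gap.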
/* print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;  /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr, "%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}
/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;                 /* "core" (minus endgaps) */
    int     firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);

    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base" : "residue", (ngapx == 1)? "" : "s");
        fprintf(fx, "%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base" : "residue", (ngapy == 1)? "" : "s");
        fprintf(fx, "%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized. left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}
static  nm;                     /* matches in core -- for checking */
static  lmax;                   /* lengths of stripped file names */
static  ij[2];                  /* jmp index for a path */
static  nc[2];                  /* number at start of current line */
static  ni[2];                  /* current elem number -- for gapping */
static  siz[2];
static char *ps[2];             /* ptr to current element */
static char *po[2];             /* ptr to next output char slot */
static char out[2][P_LINE];     /* output line */
static char star[P_LINE];       /* set by stars() */
/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;     /* char count */
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }

    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {                  /* we're putting a seq element
                                     */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}
/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register    i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}
/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {

            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
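
The stripname() routine removes any leading path from a sequence file name by overwriting the name in place. For comparison, a non-destructive variant can be sketched with the standard strrchr() call. The name base_name is hypothetical, and '/' is assumed to be the path separator.

```c
#include <string.h>

/* Hypothetical non-destructive variant: return a pointer to the part of
 * path that follows the last '/', or path itself when it has no '/'. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Unlike stripname(), this variant leaves the caller's string untouched, so the full path remains available for error messages.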
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checking
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();                      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char    *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file, "r")) == 0) {
        fprintf(stderr, "%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr, "%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';

    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU", *(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
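
The last step of getseq() decides whether the input is DNA by counting how many of its letters are A, T, G, C or U. A standalone sketch of that check follows. The function name looks_like_dna is hypothetical, and the sketch takes an in-memory string rather than a file.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical standalone version of the check at the end of getseq():
 * the sequence is treated as DNA when the count of A, T, G, C and U
 * letters exceeds one third of all letters (integer division, strict
 * comparison, as in the listing). */
int looks_like_dna(const char *seq)
{
    int natgc = 0, tlen = 0;
    const char *p;

    for (p = seq; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p))
            continue;               /* skip digits, spaces, newlines */
        tlen++;
        if (strchr("ATGCU", toupper((unsigned char)*p)) != NULL)
            natgc++;
    }
    return natgc > tlen / 3;
}
```

Note that the header comment of the program says "1/3 or more", whereas the comparison is strict on an integer quotient, so sequences sitting exactly on the threshold can go either way depending on their length.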
char    *
g_calloc(msg, nx, sz)
    char    *msg;           /* program, calling routine */
    int     nx, sz;         /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}
/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;
            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

Page 3 of nwsubr.c
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}
GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATGTGTGGGCTCTG

GAAGTGTGCTACATCTCAGGCTTGGTCTTGTCGTGCTTACTCACCTTCCTGGTCCTGATG 

CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 

TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGC 

TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC 

TTCCTGGGAACCACGGCCCTGGGCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC 

CTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCT 

GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 

CAGCTGACCAACCGGCGAGTGCTCTATGGAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 

CTGGTGGGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC 

CACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGGCGCCACTCTCGACCCCGGC 
TACTACACGTACCGAA 
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CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGGTGGGCTCAGAGGAGAAGGC 
CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATAAAG
GAAAGGAAAGAGACAAGGAAGGGAGAGGTGAGGAGAGGGCTTGATTGGAGGAGAAGGGCC 
AGAGAATGTCGTCCCAGCCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT 
CCTATGGCAGCTGGTACATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCGAGAGGGGG 
AAGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCGTGCCTGGCCTCGC 
TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCGAGCTCTGGCCTG
ACTGTGTGCGTGGCAGGCCCGGCCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTGCCCGAGGAGGACGCATTGCCCTTCCTGA 
GTCTGGCCTGAGCACCCAGCCAAGATGGGAAAACTGAGGCTCGAAGAGGGGCCTGGAAGA
TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGTGCCACGGCTG
GCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGGCGACCTTGGGGTCCAGG
TCTGGCAGAGGGCAGAGTGTCGCCAGGTGCCCAAGATCTACAAGTACTACTCCCTGCTGG
CCTGCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTAGCCTGTGCAGCTGG 
TGAGAAGCTTGAGCCGTAGGACAGGAGGAGGCTGCAAGGGGCTGCAGAGCAGCTACTGTG 
AGGAATATCTGAGGAACCTCCTTTGGAGGAAGAAGCTGGGAAGGAGCTAGCACACCTCCA
AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTACACTCCACAGC
CAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGGACGGCCATTT
ACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACGCACTATGCAGAAGGTGAGGGCAG 
GGGTCACCAGGGATGTCTCCTACCTGCTGGGCGGCTTTGGAATCGTGCTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTGTGCTACATCT
CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTGCTGATGCGGTCACTGGTGAGAC
ACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGGATC
GGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGTGCCTACCAGA 
CAGCCTTTATCTGGCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTGGGAACCACGG 
CCCTGGCCTTCCTGGTGCTCATGCCTGTGCTGCATGGCAGGAACCTCCTGCTCTTCGGTT
CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAGAACA
TGGCAGCCCATTGGGTCTTCCTGGAGAGTCATGATGGAGACCCACAGCTGACCAACCGGC
GAGTGCTCTATGCAGCCACCTTTCTTGTCTTCCCCCTCAATGTGCTGGTGGGTGCCATAG 
TGGCCACCTGGGGAGTGCTCCTCTCTGCGCTCTACAACGCGATCCACCTTGGGCAGATGG 
ACCTCAGCCTGCTGCCACCGAGAGCCGCGACTCTCGACCCCGGCTAGTACACGTACCGAA
ACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGGCATGACAGCCTTCTGCTCCCTGC
TCCTGCAAGGGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGACAGCCTCAGAC
CAGGGGAGGAAGAGGAAGGGATGCAGGTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGCCCGGGGCCAGCCGCGGGAGGGCTCGCTGGGGTCTGGGCTACAGGCTGCTGCAGA 
ACCCAAGCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCGAATGGTGCCCAGCCCT
GAGGGCAGGGAAGGTCAACGCACCTGCCCATCTGTGCTGAGGCATGTTCGTGCCTACCAG
CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCGATGCAGCCAGCAGGTCCTCC
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGGACTGGGAGCCTCAGGAGGGCTCTGCTCC 
ACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTCCCTACCCTGGC
TCTGCCATCAGCCTTGAAGGGCCTGGATGAAGCCTTCTCTGGAAGCACTCCAGCCCAGCT
CCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCTCACCCCCTCAG 
CGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGGCCTCTGGCCTGGAGGGCAG 
CCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCAGACTCGAGAGCCAGATAT 
TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTCCCTGCAATAAA
CTTGTTCCTGAGAAAAA
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Important features of the protein: 

Signal peptide:
none

Transmembrane domain:
54-71
93-111
140-157
197-214
291-312
356-371
425-444
464-481
505-522



Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
50-56
167-173
232-238
308-314
332-338
516-522
618-624
622-628
631-637
652-658

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
355-366

Motif name: ATP/GTP-binding site motif A (P-loop).
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FIGURE 10 
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FIGURE 12B 
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FIGURE 16 
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Samples: WT (n = 4); Hyperplastic Mammary Gland # 601, 315, 316, 317, 319, 328, 329
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Figure 22 
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Figure 25 




Figure 26 
Stra6 / GAPDH 
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